ABSTRACT


Tower-based  surveillance  systems  have  been  employed  by  the  U.S.  military  to  enhance 
intelligence,  surveillance,  and  reconnaissance  capabilities  in  Iraq  and  Afghanistan.  We 
consider  a  scenario  wherein  two  surveillance  towers  are  installed  in  separate  locations; 
however,  the  surveillance  team  does  not  have  enough  operators  to  operate  both  towers  to 
their  capacity.  Two  strategies  can  be  used  to  operate  these  two  towers:  stationary 
allocation  and  dynamic  allocation.  We  formulate  a  two-person  nonzero-sum  game  to 
analyze  these  strategies,  in  which  the  surveillance  team  wants  to  maintain  regional 
stability  while  insurgents  carry  out  attacks  to  disrupt  it. 

Our  analysis  suggests  that  the  dynamic  allocation  strategy  can  improve  the 
performance  of  surveillance  towers  over  stationary  allocation  under  most  circumstances. 
The improvement tends to be more significant when the surveillance team has more
surveillance resources. Dynamic allocation tends to be less effective when (1) a
detected  attack  has  a  smaller  negative  impact  on  the  insurgent  operations,  or  when  (2)  a 
detected  attack  brings  a  larger  immediate  benefit  to  the  surveillance  team. 
EXECUTIVE SUMMARY


Dominant intelligence, surveillance, and reconnaissance (ISR) capability is one of the key
enablers in irregular warfare. This thesis is motivated by the deployment of the
Ground-Based Operational Surveillance System (G-BOSS), a 24-hour all-weather tower-based
surveillance system, to enhance situational awareness in Iraq and Afghanistan in the late
2000s. For the case in which there are not enough operators to staff all surveillance
towers, we examined whether it helps to move the operators dynamically between towers, in
the hope that an understaffed surveillance tower can still deter insurgent activity.

We  considered  the  following  scenario:  two  surveillance  towers  are  installed  in 
separate  locations.  However,  the  surveillance  team  does  not  have  enough  operators  to 
operate  both  towers  to  their  capacity.  We  compared  two  strategies:  stationary  allocation 
and  dynamic  allocation.  With  stationary  allocation,  the  team  splits  up  so  that  each  tower 
is  partially  operational;  with  dynamic  allocation,  the  team  moves  back  and  forth  between 
the  two  towers  at  random  intervals.  We  formulated  a  two-person  nonzero-sum  game,  in 
which  the  surveillance  team  wants  to  maintain  regional  stability  while  the  insurgents 
carry  out  attacks  to  disrupt  it. 

Our  analysis  suggests  that  the  dynamic  allocation  strategy  can  improve  the 
performance  of  the  surveillance  towers  over  stationary  allocation  under  most 
circumstances. The improvement tends to be more significant when the surveillance team
has more surveillance resources. Dynamic allocation tends to be less effective when
(1) a detected attack has a smaller negative impact on the insurgent operations, or when
(2) a detected attack brings a larger immediate benefit to the surveillance team. These
findings can provide suggestions for decision makers in allocating resources to enhance
ISR capabilities on the battlefield.
I. INTRODUCTION


A.  BACKGROUND 

1.  Intelligence,  Surveillance,  and  Reconnaissance 

In modern warfare, military forces seek not only to advance weapon systems, but
also  to  enhance  their  capability  in  intelligence,  surveillance,  and  reconnaissance  (ISR). 
According  to  Joint  Publication  1-02,  ISR  is  an  activity  that  synchronizes  and  integrates 
the  planning  and  operation  of  sensors  and  assets,  and  processing,  exploitation,  and 
dissemination  systems  in  direct  support  of  current  and  future  operations  [1].  ISR  is  an 
integrated  intelligence  and  operations  function.  For  U.S.  military  operations  in  Iraq  and 
Afghanistan,  ISR  capabilities  are  more  critical  than  ever  before,  due  to  the  nature  of 
insurgent  activities.  Since  insurgents  can  blend  in  easily  with  non-combatant  citizens,  an 
ability  to  dominate  ISR  on  the  battlefield  is  critical.  A  great  deal  of  effort  has  been 
exerted  to  enhance  ISR  capabilities.  For  example,  unmanned  aerial  vehicles  and  Ground- 
Based  Operational  Surveillance  Systems  (G-BOSS)  have  been  deployed  for  some  time 
with  many  documented  success  stories.  Generally  speaking,  these  systems  provide 
surveillance  and  reconnaissance  on  the  battlefield  by  collecting  video  and  audio 
intelligence  to  enhance  the  commander's  situational  awareness  on  the  battlefield.  This 
thesis  focuses  on  G-BOSS. 

2.  Ground-Based  Operational  Surveillance  System 

G-BOSS  is  a  tower-based  surveillance  system  derived  from  the  sensor  suite 
utilized  on  the  Rapid  Aerostat  Initial  Deployment,  as  shown  in  Figure  1.  This  system 
consists  of  four  major  assemblies: 

•  Cameras, including one primary infrared camera (FLIR T-3000), and one
electro-optical infrared camera (FLIR Star SAFIRE IIIFP).

•  One  mobile  tower  (approximately  107  feet  tall). 

•  One  Man-Portable  Surveillance  and  Target  Acquisition  Radar  (MSTAR). 

•  One  Ground  Control  Station  (GCS). 
Currently, the United States Marine Corps (USMC) deploys G-BOSS to Iraq and
Afghanistan to enhance ISR capabilities. The USMC awarded the initial $60 million
G-BOSS contract to Raytheon on April 9, 2008. The goal is to use these surveillance
systems  to  detect  and  disrupt  insurgent  activities.  According  to  a  news  release  from  the 
Quantico  Sentry  [2],  G-BOSS  is  being  deployed  in  four  phases. 

1.  Phase  One  is  deployment  of  G-BOSS  to  coalition  outposts.  During  this  phase, 
the  system  is  operated  manually  at  the  base  of  each  tower,  with  radio 
communications  to  the  Combat  Operation  Center  (COC)  as  shown  in  Figure  2. 

2.  During  Phase  Two,  G-BOSS  is  operated  but  the  data/information  is 
automatically  fed  to  the  COC. 

3.  During  Phase  Three,  G-BOSS  is  controlled  from  within  the  COC,  with 
automatic  slewing  or  rotating  capabilities.  During  this  phase,  video  storage 
capabilities  are  integrated. 

4.  By  Phase  Four,  the  surveillance  crew  in  charge  of  monitoring  G-BOSS  can 
track  not  only  what  is  happening  in  their  own  region,  but  also  that  of  the  entire 
province  through  an  integrated  network. 

As  this  system  is  phased  in,  more  monitors  will  be  installed  in  COCs  for 
consistent  surveillance.  It  will  thus  be  necessary  to  increase  the  number  of  operators  to 
staff all systems in order to have better surveillance results. The number of
operators, however, is usually tightly constrained in combat situations. If it
is  not  possible  to  increase  the  number  of  operators,  the  workload  of  current  operators  will 
then  increase  accordingly.  According  to  Parasuraman  and  Mouloua  [3],  "the  most 
significant  factor  that  may  influence  the  accuracy  of  monitoring  under  automation  is  task 
loads  imposed  on  the  operator."  What  can  be  done  to  make  the  best  use  of  G-BOSS  when 
facing  a  manpower  constraint? 

This  thesis  explores  the  idea  of  assigning  operators  in  a  dynamic  manner.  Instead 
of  assigning  a  single  operator  to  one  G-BOSS,  the  operator  is  moved  among  systems 
from  time  to  time.  Because  the  tower-based  surveillance  system  is  prominent  in  the  areas 
where  it  is  installed,  the  tower  may  still  produce  a  deterrent  effect  for  insurgents,  even  if 
it  is  not  actively  being  monitored.  With  that  idea  in  mind,  one  group  of  operators  can  be 
assigned  to  shift  their  attention  back  and  forth  between  towers;  the  tower  without 
operators will serve as a decoy. The objective of this thesis is to use mathematical models
to determine whether dynamic allocation of manpower can improve the performance of
multiple systems, and whether a decoy tower can provide a deterrent effect on
insurgents.


Figure  1.  Top  of  surveillance  tower 
(From:  FLIR  http://www.gs.flir.com/datasheets/land.cfim) 


Figure  2.  G-BOSS  control  room  (From:  Quantico  Sentry) 
B. OBJECTIVE

The  goal  of  this  thesis  is  to  determine  whether  it  is  helpful  to  dynamically  move 
manpower  between  surveillance  towers  when  manpower  is  limited.  The  following 
scenario  is  considered:  two  surveillance  towers  are  installed  in  two  desired  locations. 
However,  there  is  only  one  surveillance  team  to  operate  one  tower  at  its  full  capacity.  It  is 
possible  either  to  move  the  surveillance  team  back  and  forth  between  the  two  towers,  or 
to  split  the  team  so  that  each  tower  can  be  partially  operational.  The  thesis  develops 
mathematical  models  to  study  these  two  strategies.  The  findings  in  this  thesis  can  provide 
suggestions  for  decision  makers  while  employing  surveillance  systems  on  the  battlefield. 

C.  RELATED  WORKS 

A  significant  amount  of  work  has  been  done  to  improve  the  performance  and 
effectiveness  of  surveillance  systems  in  ISR  and  perimeter  protection.  The  work  can  be 
divided  into  three  categories. 

First,  from  the  perspective  of  technology,  the  advances  of  cameras  have  had  a 
significant  influence  on  system  performance,  particularly  the  combination  of  a  high- 
resolution  charge-coupled  device  and  electro-optics  with  an  infrared  sensor  system  [4]. 
With  a  longer  surveillance  range,  higher  image  resolution,  and  information  integration,  a 
camera  could  remotely  monitor  an  adversary’s  activity  day  and  night.  These  new 
surveillance  technologies  not  only  mitigate  false  detection  rates,  but  also  help  reduce 
crew  requirements. 

Second,  there  is  a  stream  of  work  that  uses  mathematical  modeling  and 
optimization  to  improve  surveillance  results.  Szechtman  et  al.  [5]  used  mathematical 
models  to  analyze  optimal  strategies  for  a  moving  surveillance  sensor  to  detect  infiltrators 
on  a  border.  Midgette  [6]  proposed  an  agent-based  simulation  model  to  elevate  the 
operational  effectiveness  of  G-BOSS  as  guidance  for  system  fielding.  Also,  William  [7] 
carried out a surveillance and interdiction model with a game-theoretic approach to fight
against  vehicle-borne  improvised  explosive  devices. 
Third, there are studies that compare the frequency of criminal activity before and
after installation of surveillance monitors. Gill and Spriggs [8] summarized their research
on the impact of using closed-circuit television (CCTV) in different cities throughout
Great Britain as follows:

The  use  of  CCTV  needs  to  be  supported  by  a  strategy  outlining  the 
objectives  of  the  system  and  how  these  will  be  fulfilled.  This  needs  to  take 
account  of  local  crime  problems  and  prevention  measures  already  in  place. 

Welsh  and  Farrington  [9]  concluded  their  research  about  using  CCTV  in  crime 
prevention  as  follows: 

Overall,  it  might  be  concluded  that  CCTV  reduces  crime  to  a  small  degree. 

In  light  of  the  successful  results,  future  CCTV  schemes  should  be 
carefully  implemented  in  different  settings  and  should  employ  high  quality 
evaluation  designs  with  long  follow-up  periods. 

Conclusions from previous research indicate that having appropriate surveillance
equipment  is  a  key  enabler  toward  better  detection  results  in  a  surveillance  plan,  which 
corresponds  to  the  objective  of  this  thesis. 

D.  THESIS  ORGANIZATION 

The  rest  of  this  thesis  is  organized  as  follows.  Chapter  II  formulates  a  two-person 
nonzero-sum  game  to  model  the  interaction  between  coalition  forces  and  insurgents. 
Coalition  forces  assign  manpower  between  two  surveillance  towers,  while  insurgents 
launch  attacks  in  order  to  interrupt  regional  stability.  In  Chapter  III,  numerical  analysis  is 
carried  out  to  demonstrate  the  model.  Situations  are  identified  in  which  it  is  helpful  to 
dynamically  allocate  manpower  between  the  two  surveillance  towers.  Finally,  Chapter  IV 
presents  findings  and  suggests  future  research  directions. 




II.  METHODOLOGY 


A.  MODEL 

Consider  a  situation  in  which  Blue  has  established  military  bases  in  two  towns. 
Blue’s  goal  is  to  maintain  peace  and  eliminate  insurgent  activities  in  these  two  towns. 
(From  now  on  insurgent  activities  will  be  referred  to  as  attacks  for  brevity.)  Blue  has  one 
surveillance  tower  set  up  in  each  base,  but  Blue  cannot  detect  all  attacks  in  both  towns  at 
all times, due to a lack of resources (manpower, equipment, etc.). Denote by $s$ ($s < 2$) the
total resources available to Blue, such that Blue can allocate detection probability $p_i$ to
tower $i$, as long as $p_1 + p_2 \le s$ and $0 \le p_i \le 1$, for $i = 1, 2$. The problem facing Blue is
how to allocate $s$ between the two surveillance towers.

In  each  town,  an  insurgent  group  attempts  to  carry  out  attacks  for  its  own  gain. 
The insurgent group operating in town $i$ is referred to as Red $i$, for $i = 1, 2$. Red 1 and Red
2  operate  independently  from  each  other.  For  each  Red  team,  the  status  quo  is  not  to 
attack,  in  which  case  neither  Red  nor  Blue  receives  a  reward  or  a  penalty.  If  a  Red  team 
launches  an  attack,  there  are  two  possible  outcomes:  either  the  attack  is  detected  by 
Blue’s surveillance tower, or it is not. The Red team earns a reward of +1 for each
undetected attack, and incurs a penalty $r > 0$ (reward $-r$) for each detected attack.
Because Blue’s goal is to maintain peace and ideally to eliminate attacks altogether, there
is a penalty for each attack regardless of whether or not the attack is detected. However,
detecting an attack is better than not detecting it, so Blue incurs a penalty 1 (reward $-1$)
for an undetected attack and a smaller penalty $b \in (0, 1)$ (reward $-b$) for a detected attack.
Table  1  summarizes  the  reward  for  Blue  and  each  Red  team,  respectively. 

We  model  the  interaction  between  Blue  and  two  Red  teams  as  a  nonzero-sum 
game,  where  Blue  moves  first,  and  then  each  Red  team  moves  second,  independently, 
after  observing  Blue's  strategy.  The  objective  of  each  player  is  to  maximize  his  own  long- 
run  average  reward. 
          No attack    Attack undetected    Attack detected
Red           0               +1                  -r
Blue          0               -1                  -b

Table 1.  Reward table


If the detection probability is $p$ in a town, Blue’s expected reward for each attack
is

$$(-1)(1 - p) + (-b)p = -1 + (1 - b)p, \tag{1}$$

and Red’s expected reward for each attack is

$$(+1)(1 - p) + (-r)p = 1 - (1 + r)p. \tag{2}$$

By setting Equation (2) to 0, we can solve for the threshold detection probability

$$\bar{p} = \frac{1}{1 + r}. \tag{3}$$

If $p > \bar{p}$, Equation (2) is negative, so it is optimal for a Red team to shut down its
operation altogether. In the special case when $p = \bar{p}$, Red’s expected reward for each
attack is 0, so Red is indifferent about whether to attack or not. For mathematical
completeness, however, assume that Red will continue to attack if $p = \bar{p}$, as it gives Blue
a negative expected reward.
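As a minimal sketch of Equations (1) through (3), the per-attack rewards and Red's attack threshold can be written as small Python functions. The function names are illustrative and do not come from the thesis; the tie-breaking rule (Red attacks when the detection probability equals the threshold) follows the assumption stated above.

```python
def blue_reward_per_attack(p: float, b: float) -> float:
    """Equation (1): Blue's expected reward for one attack at detection probability p."""
    return (-1.0) * (1.0 - p) + (-b) * p


def red_reward_per_attack(p: float, r: float) -> float:
    """Equation (2): Red's expected reward for one attack at detection probability p."""
    return (+1.0) * (1.0 - p) + (-r) * p


def red_threshold(r: float) -> float:
    """Equation (3): detection probability at which Red's expected reward is zero."""
    return 1.0 / (1.0 + r)


def red_attacks(p: float, r: float) -> bool:
    """Red keeps attacking when p does not exceed the threshold (ties go to attacking)."""
    return p <= red_threshold(r)
```

For example, with a hypothetical penalty r = 3 the threshold is 0.25, so a tower with detection probability 0.3 deters Red while one with 0.25 does not.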

Suppose each Red team can carry out attacks at a maximum rate $x$. Consider three
cases for $s$:

1.  $s \in (2\bar{p}, 2)$. If Blue allocates $p_1 = p_2 = s/2 > \bar{p}$, then both Red teams will stop
their operations. The long-run reward rate is 0 for all players.

2.  $s \in [0, \bar{p}]$. No matter how Blue allocates $s$, both Red teams will continue to attack
at the maximum rate $x$. The total long-run reward rate for both Red teams is

$$x\bigl(1 - (1 + r)p_1 + 1 - (1 + r)p_2\bigr) = x\bigl(2 - (1 + r)s\bigr). \tag{4}$$

Blue’s long-run reward rate is

$$x\bigl(-1 + (1 - b)p_1 - 1 + (1 - b)p_2\bigr) = x\bigl(-2 + (1 - b)s\bigr). \tag{5}$$

3.  $s \in (\bar{p}, 2\bar{p}]$. In this case, it is possible for Blue to allocate the detection
probability such that it is optimal for one Red team to stop its operation.

The rest of this section focuses on the case when $s \in (\bar{p}, 2\bar{p}]$. In particular, it
examines two strategies for Blue: stationary allocation and dynamic allocation.
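The three cases above can be sketched as a small classifier on the total resource $s$, together with the reward rates of Equations (4) and (5) for the case in which both Red teams attack. The function names and the returned labels are hypothetical, chosen only for this illustration.

```python
def resource_regime(s: float, r: float) -> str:
    """Classify Blue's total resource s against Red's threshold 1 / (1 + r)."""
    p_bar = 1.0 / (1.0 + r)
    if s <= p_bar:
        return "both attack"          # case 2: no allocation deters either Red team
    if s <= 2.0 * p_bar:
        return "one can be deterred"  # case 3: the case studied in the rest of the section
    return "both deterred"            # case 1: an even split exceeds the threshold


def all_attack_reward_rates(s: float, r: float, b: float, x: float):
    """Equations (4) and (5): total Red and Blue reward rates when both Red teams attack."""
    red_total = x * (2.0 - (1.0 + r) * s)
    blue = x * (-2.0 + (1.0 - b) * s)
    return red_total, blue
```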


1.  Stationary  Allocation 


With a stationary allocation, Blue assigns $p_i$ to surveillance tower $i$, $i = 1, 2$, on a
permanent basis. It is reasonable to assume that each Red team will discover this
allocation sooner or later, whether by intelligence or by computing its own success rate.
Without loss of generality, assume $p_1 \ge p_2$. It does not help to set $p_1 \le \bar{p}$, because then the
optimal strategy for each Red team is to attack at the maximum rate $x$. If Blue
sets $p_1 = \bar{p} + \varepsilon$, for some $\varepsilon > 0$, then it is optimal for Red 1 to cease its operation, and
for Red 2 to attack at rate $x$.

Using Equation (2), Red 2's long-run reward rate is

$$x\bigl(1 - (1 + r)(s - \bar{p} - \varepsilon)\bigr) = x\bigl(2 - (1 + r)(s - \varepsilon)\bigr), \tag{6}$$

which converges to

$$x\bigl(2 - (1 + r)s\bigr) \tag{7}$$

as $\varepsilon \downarrow 0$.


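Using Equation (1), Blue’s long-run reward rate is

$$x\bigl(-1 + (1 - b)(s - \bar{p} - \varepsilon)\bigr) = x\left(-1 + (1 - b)\left(s - \varepsilon - \frac{1}{1 + r}\right)\right), \tag{8}$$

which converges to

$$x\left(-1 + (1 - b)\left(s - \frac{1}{1 + r}\right)\right) \tag{9}$$

as $\varepsilon \downarrow 0$.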

2.  Dynamic  Allocation 

With a dynamic allocation, Blue first assigns $p$ to one tower and $s - p$ to the
other tower, and then swaps these allocations from time to time. Without loss of
generality, assume $p \ge s - p$. The idea of dynamic allocation is to make $p > \bar{p}$ so that
sometimes it is optimal for a Red team to pause attacks, but each Red team needs to guess
when to resume attacks. The tower with detection probability $s - p$ can be viewed as a
decoy, which may provide a deterrent effect if a Red team does not know that the detection
probability has dropped from $p$ to $s - p$.

Blue has two decision variables $p$ and $y$, such that Blue allocates detection
probability $p$ to one tower and $s - p$ to the other, and swaps these allocations at a
Poisson rate $y$. Assume that the battle goes on indefinitely, and that over time each Red
team learns about Blue's choices of $p$ and $y$, but does not discover Blue's real-time
allocation. Because the two Red teams do not interact with each other, and because the
parameters  are  identical  in  the  two  towns,  from  now  on  the  analysis  will  focus  on  the 
interaction  between  Blue  and  one  Red  team,  henceforth  Red  for  brevity. 

One  feasible  strategy  for  Red  is  to  attack  at  a  Poisson  rate  x .  Alternatively,  Red 
can  set  aside  some  effort  to  learn  about  the  real-time  detection  probability  at  a  Poisson 
rate  z.  Red  can  do  this  by  sending  a  spy,  bribing  Blue’s  people,  or  probing  the  system  in 
some way. We impose the constraint $x + az \le c$, where $a > 0$ models the
tradeoff between the attack rate $x$ and the learning rate $z$, and $c$ is the maximum attack
rate if Red sets the learning rate to 0. With a learning rate $z > 0$, Red learns about
the detection probability at time moments that constitute a Poisson process with rate $z$. In
other  words,  the  time  between  two  consecutive  learning  points  follows  an  exponential 
distribution  with  rate  z ,  independent  of  everything  else. 

Recall that $p \ge s - p$. We say a surveillance tower is in state 1 if its detection
probability is $p$, and in state 0 if its detection probability is $s - p$. In other words, each
tower remains in state 1 for a random time that is exponentially distributed with mean $1/y$,
and then switches to state 0 and stays in state 0 for another random time, which is also
exponentially distributed with mean $1/y$, and so on. In the long run, each tower will be in
each  state  50%  of  the  time.  We  say  Red  is  in  state  1  if  Red  is  carrying  out  attacks  at  a 
Poisson  rate  x ,  and  in  state  0  if  Red  pauses  its  attacks.  Red  decides  when  it  wants  to 
move  from  one  state  to  the  other. 
Because $p > \bar{p}$, when Red learns the tower is in state 1, Red should pause its
attacks. Let $P_{jk}(t)$ denote the probability that the tower will be in state $k$ after $t$ time
units if it is currently in state $j$, $j, k = 0, 1$. Using the result in Ross [10], we have

$$P_{00}(t) = P_{11}(t) = \frac{1}{2} + \frac{1}{2}e^{-2yt}, \qquad P_{01}(t) = P_{10}(t) = \frac{1}{2} - \frac{1}{2}e^{-2yt}.$$

Red can compute the probability of detection $t$ time units after it learns that
Blue is in state 1, provided Red does not have a learning point in those $t$ time units, as

$$P_{11}(t)\,p + P_{10}(t)\,(s - p) = \left(\frac{1}{2} + \frac{1}{2}e^{-2yt}\right)p + \left(\frac{1}{2} - \frac{1}{2}e^{-2yt}\right)(s - p).$$


Red should attack if this detection probability is less than $\bar{p}$. After some algebra,
we can show that Red should wait for another

$$\bar{t} = \frac{1}{2y}\ln\left(\frac{2p - s}{2\bar{p} - s}\right) \tag{10}$$


time units before resuming attacks, if Red does not have another learning point in this
time period. Consequently, Red’s optimal strategy takes the following form: whenever
Red learns that Blue’s tower is in state 1, Red pauses its attacks until the next learning
point or until $\bar{t}$ time units have elapsed. If Red learns Blue’s state is 0 within the next $\bar{t}$
time units, Red should resume attacks immediately; if Red does not have a learning point
within the next $\bar{t}$ time units, then Red resumes attacks after $\bar{t}$ time units. With this
strategy, we can define a renewal reward process, where a renewal is a time moment
when Red learns that Blue’s tower is in state 1. Figure 3 depicts this renewal reward
process.
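The waiting time in Equation (10) can be checked numerically: at that time, the detection probability Red expects (given the tower was last seen in state 1) should have decayed exactly to Red's threshold. The sketch below assumes $p$ exceeds the threshold and $s$ is strictly below twice the threshold, so the logarithm is finite; the function names are illustrative.

```python
import math


def detection_prob_after(t: float, p: float, s: float, y: float) -> float:
    """Expected detection probability t time units after the tower was seen in state 1."""
    p11 = 0.5 + 0.5 * math.exp(-2.0 * y * t)   # still (or again) in state 1
    p10 = 0.5 - 0.5 * math.exp(-2.0 * y * t)   # has moved to state 0
    return p11 * p + p10 * (s - p)


def waiting_time(p: float, s: float, r: float, y: float) -> float:
    """Equation (10): time Red waits before resuming attacks without new information."""
    p_bar = 1.0 / (1.0 + r)
    return math.log((2.0 * p - s) / (2.0 * p_bar - s)) / (2.0 * y)
```

For the hypothetical values p = 0.35, s = 0.4, r = 3, and y = 0.5, the waiting time is ln 3, and the detection probability at that moment equals the threshold 0.25.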
Figure 3. Renewal reward process.

This  diagram  depicts  the  renewal  reward  process  if  Blue  dynamically  allocates  its  resource.  For  Blue, 
each  circle  represents  a  switch  point,  with  solid  lines  being  in  state  1  (detection  probability  p)  and 
dashed  lines  being  in  state  0  (detection  probability  s-p).  For  Red,  solid  lines  indicate  state  1 
(attacking) and dashed lines indicate state 0 (not attacking). For the time line, each square represents
one of Red’s learning points, with a solid square being a renewal (Blue in state 1).

Let $T$ denote the cycle time in this renewal reward process. In addition, denote by
$T_k$ the time until the next renewal if Blue’s current state is $k$, $k = 0, 1$. To compute $E[T_1]$,
consider the next event. If Blue’s current state is 1, then the next event can either be
Blue’s switch to state 0, or Red’s learning Blue’s state. Because the time to each event is
exponentially distributed, the time to either event, whichever occurs first, is also
exponentially distributed with a rate equal to the sum of the two individual rates, $y + z$.
With probability $y/(y + z)$, the next event is Blue’s switch to state 0, in which case the
additional time until a renewal is distributed as $T_0$. With probability $z/(y + z)$, the next
event is Red’s learning Blue’s state to be 1, which constitutes a renewal. Therefore, we
can write

$$E[T_1] = \frac{1}{y + z} + \frac{y}{y + z}E[T_0].$$

With a similar argument, we can write

$$E[T_0] = \frac{1}{y} + E[T_1].$$

Solving the preceding yields

$$E[T_1] = \frac{2}{z}, \quad \text{and} \quad E[T_0] = \frac{1}{y} + \frac{2}{z}.$$

By definition, $T$ and $T_1$ have identical distributions, so

$$E[T] = E[T_1] = \frac{2}{z}.$$
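The expected cycle length can be sanity-checked with a small Monte Carlo sketch of the argument above. This simulation is an illustration added here, not part of the thesis: starting from a renewal (Blue in state 1), it draws the time to the first of "Blue switches" or "Red learns", and if Blue switches it adds the exponential time Blue spends in state 0 before returning to state 1. The function name and parameter values are assumptions.

```python
import random


def simulate_cycle_time(y: float, z: float, rng: random.Random) -> float:
    """Draw one cycle length T: time from a renewal until Red next learns state 1."""
    elapsed = 0.0
    while True:
        # Blue is in state 1: wait for the first of Blue's switch (rate y) or Red's learning (rate z).
        elapsed += rng.expovariate(y + z)
        if rng.random() < z / (y + z):
            return elapsed                  # Red learns Blue is in state 1: renewal
        # Blue switched to state 0; learning points there are not renewals,
        # so only the time until Blue switches back to state 1 matters.
        elapsed += rng.expovariate(y)
```

Averaging many simulated cycles with, say, y = 0.5 and z = 1 should give a value close to 2/z = 2.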

Let $X$ denote the number of detected attacks in a cycle, and $Y$ the number of
undetected attacks in a cycle. If Blue is in state $k$ ($k = 0, 1$) and Red is in state 1
(attacking), then let $X_k$ denote the number of detected attacks until the next renewal, and
$Y_k$ the number of undetected attacks until the next renewal.

To compute $E[X_1]$, consider whether Blue switches to state 0 first or Red learns
Blue’s state first. The time until either event occurs follows an exponential distribution
with rate $y + z$, so the expected number of detections during this time period is
$px/(y + z)$. Moreover, with probability $y/(y + z)$, Blue will switch to state 0 first, in
which case the additional number of detected attacks in the cycle is distributed as $X_0$.
With probability $z/(y + z)$, Red will learn that Blue is in state 1 first, which constitutes a
renewal. Therefore, we can write

$$E[X_1] = \frac{x}{y + z}\,p + \frac{y}{y + z}E[X_0].$$

With a similar argument, we can write

$$E[X_0] = \frac{x}{y}(s - p) + E[X_1].$$

Solving from the preceding yields

$$E[X_1] = \frac{x}{z}\,s, \quad \text{and} \quad E[X_0] = \frac{x}{y}(s - p) + \frac{x}{z}\,s.$$

In a similar way, we can set up two linear equations involving $E[Y_1]$ and $E[Y_0]$
as follows:

$$E[Y_1] = \frac{x}{y + z}(1 - p) + \frac{y}{y + z}E[Y_0],$$

$$E[Y_0] = \frac{x}{y}\bigl(1 - (s - p)\bigr) + E[Y_1].$$

Solving from these two linear equations yields

$$E[Y_1] = \frac{x}{z}(2 - s), \quad \text{and} \quad E[Y_0] = \frac{x}{y}\bigl(1 - (s - p)\bigr) + \frac{x}{z}(2 - s).$$
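Both pairs of linear equations above share one shape, so a single helper can solve them. The sketch below uses exact rational arithmetic to confirm the closed forms for the expected detected and undetected attack counts; the helper names are hypothetical, and the sample values passed in the usage are arbitrary.

```python
from fractions import Fraction


def solve_renewal_pair(c1, q, c0):
    """Solve E1 = c1 + q * E0 and E0 = c0 + E1 for (E1, E0), with 0 <= q < 1."""
    e1 = (c1 + q * c0) / (1 - q)
    return e1, c0 + e1


def expected_counts(x, y, z, p, s):
    """Return (E[X1], E[X0], E[Y1], E[Y0]) from the two pairs of linear equations."""
    q = y / (y + z)                                   # probability Blue switches before Red learns
    ex1, ex0 = solve_renewal_pair(x * p / (y + z), q, x * (s - p) / y)
    ey1, ey0 = solve_renewal_pair(x * (1 - p) / (y + z), q, x * (1 - (s - p)) / y)
    return ex1, ex0, ey1, ey0
```

Calling `expected_counts(Fraction(3), Fraction(1, 2), Fraction(1), Fraction(7, 20), Fraction(2, 5))` returns values that agree exactly with the closed-form solutions stated above.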

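Now we proceed to compute $E[X]$ and $E[Y]$. Let $Z$ denote the time of the first
learning point after the renewal, which follows an exponential distribution with rate $z$.
To compute $E[X]$, condition on the event $Z = t$. If $t < \bar{t}$, then at time $t$, either (1) the
cycle ends if Blue is in state 1, or (2) Red resumes attacks (moves to state 1) if Blue is in
state 0. If $t > \bar{t}$, then Red resumes attacks at time $\bar{t}$. Therefore,

$$\begin{aligned}
E[X] &= \int_0^{\bar{t}} P_{10}(t)\,E[X_0]\,z e^{-zt}\,dt + e^{-z\bar{t}}\bigl(P_{11}(\bar{t})\,E[X_1] + P_{10}(\bar{t})\,E[X_0]\bigr) \\
&= \int_0^{\bar{t}} \left(\frac{1}{2} - \frac{1}{2}e^{-2yt}\right) z e^{-zt}\,dt\;E[X_0] + e^{-z\bar{t}}\left(\frac{1}{2} + \frac{1}{2}e^{-2y\bar{t}}\right)E[X_1] + e^{-z\bar{t}}\left(\frac{1}{2} - \frac{1}{2}e^{-2y\bar{t}}\right)E[X_0] \\
&= \frac{1}{2}\,\frac{x}{z}\,s\,\bigl(1 + e^{-z\bar{t}}\bigr) - \frac{1}{2}\,\frac{x}{z + 2y}\,(2p - s)\bigl(1 - e^{-(z + 2y)\bar{t}}\bigr),
\end{aligned}$$

where $\bar{t}$ is given in Equation (10). Similarly,

$$\begin{aligned}
E[Y] &= \int_0^{\bar{t}} P_{10}(t)\,E[Y_0]\,z e^{-zt}\,dt + e^{-z\bar{t}}\bigl(P_{11}(\bar{t})\,E[Y_1] + P_{10}(\bar{t})\,E[Y_0]\bigr) \\
&= \frac{1}{2}\,\frac{x}{z}\,(2 - s)\bigl(1 + e^{-z\bar{t}}\bigr) + \frac{1}{2}\,\frac{x}{z + 2y}\,(2p - s)\bigl(1 - e^{-(z + 2y)\bar{t}}\bigr).
\end{aligned}$$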


Red’s long-run reward rate is equal to (renewal reward theory)

$$\begin{aligned}
R(p, y, x, z) &= (+1)\frac{E[Y]}{E[T]} + (-r)\frac{E[X]}{E[T]} \\
&= \frac{x}{4}\left(\bigl(2 - (1 + r)s\bigr)\bigl(1 + e^{-z\bar{t}}\bigr) + (1 + r)(2p - s)\frac{z}{z + 2y}\bigl(1 - e^{-(z + 2y)\bar{t}}\bigr)\right).
\end{aligned} \tag{11}$$

Red’s decision variables are $x$ and $z$, subject to $x + az \le c$. Blue’s long-run reward rate
is

$$\begin{aligned}
B(p, y, x, z) &= (-1)\frac{E[Y]}{E[T]} + (-b)\frac{E[X]}{E[T]} \\
&= \frac{x}{4}\left(\bigl(-2 + (1 - b)s\bigr)\bigl(1 + e^{-z\bar{t}}\bigr) - (1 - b)(2p - s)\frac{z}{z + 2y}\bigl(1 - e^{-(z + 2y)\bar{t}}\bigr)\right),
\end{aligned} \tag{12}$$

with decision variables $p$ and $y$.
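Equations (11) and (12) can be evaluated with the following sketch. To keep the two functions symmetric, the waiting time of Equation (10) is passed in as the argument `t_bar` rather than recomputed; that choice, and all function names, are assumptions made for this illustration.

```python
import math


def _shared_terms(p: float, s: float, y: float, z: float, t_bar: float):
    """Terms common to Equations (11) and (12)."""
    attack = 1.0 + math.exp(-z * t_bar)
    decoy = (2.0 * p - s) * z / (z + 2.0 * y) * (1.0 - math.exp(-(z + 2.0 * y) * t_bar))
    return attack, decoy


def red_reward_rate(p, y, x, z, s, r, t_bar):
    """Equation (11): Red's long-run reward rate in one town."""
    attack, decoy = _shared_terms(p, s, y, z, t_bar)
    return x / 4.0 * ((2.0 - (1.0 + r) * s) * attack + (1.0 + r) * decoy)


def blue_reward_rate(p, y, x, z, s, b, t_bar):
    """Equation (12): Blue's long-run reward rate in one town."""
    attack, decoy = _shared_terms(p, s, y, z, t_bar)
    return x / 4.0 * ((-2.0 + (1.0 - b) * s) * attack - (1.0 - b) * decoy)
```

One quick consistency check: with no learning (z = 0) Red attacks all the time against an average detection probability of s/2, so Equation (11) should reduce to x(1 - (1 + r)s/2).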


Remark 1. An important parameter to consider is the long-run proportion of time
when Red is attacking. From the definition of the renewal process, at the beginning of
each cycle Red will remain in state 0 until either the next learning point, or $\bar{t}$, whichever
occurs first. In other words, the amount of time Red is in state 0 in each cycle is
$\min(W, \bar{t})$, where $W$ follows an exponential distribution with rate $z$. In each cycle, the
expected time that Red is not attacking (state 0) is

$$E[\min(W, \bar{t})] = \int_0^{\bar{t}} w\,z e^{-zw}\,dw + \int_{\bar{t}}^{\infty} \bar{t}\,z e^{-zw}\,dw = \frac{1}{z}\bigl(1 - e^{-z\bar{t}}\bigr).$$

Consequently, the long-run proportion of time Red is not attacking (state 0) is

$$\frac{E[\min(W, \bar{t})]}{E[T]} = \frac{1}{2}\bigl(1 - e^{-z\bar{t}}\bigr). \tag{13}$$

The long-run proportion of time Red is attacking (state 1) is

$$\frac{1}{2}\bigl(1 + e^{-z\bar{t}}\bigr). \tag{14}$$


Remark  2.  We  assume  that  the  two  Red  teams  are  operating  independently, 
without any coordination. That is, when the Red team in one town learns the tower’s
detection  probability,  it  does  not  give  this  information  to  the  Red  team  in  the  other  town. 
In  the  case  when  the  two  Red  teams  maintain  real-time  communication,  essentially  the 
learning  rate  at  each  town  is  doubled  and  the  same  analysis  applies. 
B. RED’S OPTIMAL STRATEGY

When  Blue  dynamically  allocates  its  resources  between  the  two  surveillance 
towers,  each  player  has  two  decision  variables,  as  shown  in  Equations  (11)  and  (12).  In 
this  model,  Blue  moves  first  and  Red  moves  second,  with  each  player  trying  to  maximize 
his  own  long-run  average  reward.  To  compute  this  equilibrium,  we  first  solve  Red's 
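optimization problem for given $p$ and $y$. Although Red has two decision variables, at
optimality the constraint $x + az \le c$ must hold with equality, because $R(p, y, x, z)$ strictly
increases in $x$ when $z$ is held constant. Substituting $x = c - az$ into Equation (11), Red’s
objective function involves a single variable $z$ as follows:

$$R(z) = \frac{c - az}{4}\left(\bigl(2 - (1 + r)s\bigr)\bigl(1 + e^{-z\bar{t}}\bigr) + (1 + r)(2p - s)\frac{z}{z + 2y}\bigl(1 - e^{-(z + 2y)\bar{t}}\bigr)\right).$$

By letting

$$K_1 = 2 - (1 + r)s, \qquad K_2 = (1 + r)(2p - s),$$

and using Equations (3) and (10) to get

$$\frac{K_1}{K_2} = \frac{\frac{2}{1 + r} - s}{2p - s} = \frac{2\bar{p} - s}{2p - s} = e^{-2y\bar{t}} < 1,$$

we can simplify $R(z)$ to

$$R(z) = K_2\,\frac{c - az}{4}\left(e^{-2y\bar{t}} + \frac{z}{z + 2y} + \frac{2y}{z + 2y}\,e^{-(z + 2y)\bar{t}}\right). \tag{15}$$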
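Proposition 3. The function $R(z)$ is concave in $z$.

Proof: We will show $R''(z) < 0$ to complete the proof. To facilitate the computation, let

$$g(z) = \frac{c - az}{4}, \qquad h(z) = e^{-2y\bar{t}} + \frac{z}{z + 2y} + \frac{2y\,e^{-(z + 2y)\bar{t}}}{z + 2y},$$

so $R''(z) = K_2\bigl(g''(z)h(z) + 2g'(z)h'(z) + g(z)h''(z)\bigr)$.
For $g(z)$, compute

$$g'(z) = -\frac{a}{4} < 0, \qquad g''(z) = 0.$$

Taking the first derivative of $h(z)$ yields

$$\begin{aligned}
h'(z) &= \frac{2y}{(z + 2y)^2}\left(1 - e^{-(z + 2y)\bar{t}} - (z + 2y)\bar{t}\,e^{-(z + 2y)\bar{t}}\right) \\
&= \frac{2y}{(z + 2y)^2}\left(1 - e^{-(z + 2y)\bar{t}}\bigl(1 + (z + 2y)\bar{t}\bigr)\right) \\
&> \frac{2y}{(z + 2y)^2}\left(1 - e^{-(z + 2y)\bar{t}}\,e^{(z + 2y)\bar{t}}\right) = 0,
\end{aligned}$$

where the inequality follows by letting $A = (z + 2y)\bar{t} > 0$ in the inequality $1 + A < e^{A}$. In
addition,

$$\begin{aligned}
h''(z) &= \frac{4y}{(z + 2y)^3}\left(-1 + e^{-(z + 2y)\bar{t}} + (z + 2y)\bar{t}\,e^{-(z + 2y)\bar{t}} + \frac{1}{2}\bigl((z + 2y)\bar{t}\bigr)^2 e^{-(z + 2y)\bar{t}}\right) \\
&= \frac{4y}{(z + 2y)^3}\left(-1 + e^{-(z + 2y)\bar{t}}\left(1 + (z + 2y)\bar{t} + \frac{1}{2}\bigl((z + 2y)\bar{t}\bigr)^2\right)\right) \\
&< \frac{4y}{(z + 2y)^3}\left(-1 + e^{-(z + 2y)\bar{t}}\,e^{(z + 2y)\bar{t}}\right) = 0,
\end{aligned}$$

where the inequality follows by letting $A = (z + 2y)\bar{t} > 0$ in the inequality
$1 + A + A^2/2 < e^{A}$. Because $K_2 > 0$, $g''(z) = 0$, $g'(z) < 0$, $h'(z) > 0$, $h''(z) < 0$, and
$g(z) \ge 0$ for $z \in [0, c/a]$, it follows that $R''(z) < 0$, so $R(z)$ is concave in $z$. ■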
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Red’s  objective  is  to  choose  ze[0,c/a]  to  maximize  R(z).  Because  R(z )  is 
concave  in  z,  to  maximize  R(z) ,  first  compute 

$$R'(0) = \frac{K_1}{4}\,(-2a - ct) + \frac{K_2\,c}{8y}\left(1 - e^{-2yt}\right). \qquad (16)$$

Consider two cases:

1. $R'(0) \le 0$: In this case, it is optimal to set $z^* = 0$.

2. $R'(0) > 0$: Red wants to maximize $R(z)$ for $z \in [0, c/a]$. Because $R(z)$ is concave and $R'(c/a) < 0$, maximizing $R(z)$ is equivalent to solving $R'(z) = 0$. A simple bisection algorithm is given below to compute $z^*$ such that $R'(z^*) = 0$. The constant $\delta$ is the error bound on the solution, and $\ell$ and $u$ denote the lower and upper ends of the search interval.

(a) Let $\ell \leftarrow 0$ and $u \leftarrow c/a$.

(b) Let $m \leftarrow (\ell + u)/2$, and compute $R'(m)$.

(c) If $R'(m) = 0$, then $z^* = m$ and exit. If $R'(m) > 0$, then let $\ell \leftarrow m$; if $R'(m) < 0$, then let $u \leftarrow m$.

(d) If $u - \ell > \delta$, go to (b); otherwise $z^* = \ell$ and exit.
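The two cases and the bisection steps above can be sketched in Python as follows. This is a minimal illustration, not the thesis's VBA implementation. The function names are hypothetical, $R'(z)$ is obtained by differentiating the single-variable objective written as $K_2\,g(z)\,h(z)$, and $t$ is assumed to be recovered from the relation $e^{-2yt} = K_1/K_2$. The sketch also assumes $y > 0$ and $0 < K_1 < K_2$.

```python
import math


def make_red_objective(p, y, s, r, a=1.0, c=1.0):
    """Return (R, dR): Red's single-variable objective and its derivative."""
    k1 = 2.0 - (1.0 + r) * s
    k2 = (1.0 + r) * (2.0 * p - s)
    # Assumption: t is recovered from exp(-2*y*t) = K1/K2.
    t = math.log(k2 / k1) / (2.0 * y)

    def h(z):
        w = z + 2.0 * y
        return k1 / k2 + z / w + (2.0 * y / w) * math.exp(-w * t)

    def dh(z):
        w = z + 2.0 * y
        return (2.0 * y / w ** 2) * (1.0 - math.exp(-w * t) * (1.0 + w * t))

    def R(z):
        return k2 * (c - a * z) / 4.0 * h(z)

    def dR(z):
        # Product rule with g(z) = (c - a z)/4 and g'(z) = -a/4.
        return k2 * (-(a / 4.0) * h(z) + (c - a * z) / 4.0 * dh(z))

    return R, dR


def red_best_response(p, y, s, r, a=1.0, c=1.0, delta=1e-8):
    """Bisection on R'(z) = 0 over [0, c/a]; returns 0 when R'(0) <= 0."""
    _, dR = make_red_objective(p, y, s, r, a, c)
    if dR(0.0) <= 0.0:          # case 1: no learning is optimal
        return 0.0
    lo, hi = 0.0, c / a         # case 2: root of R' lies inside the interval
    while hi - lo > delta:
        mid = 0.5 * (lo + hi)
        slope = dR(mid)
        if slope == 0.0:
            return mid
        if slope > 0.0:
            lo = mid
        else:
            hi = mid
    return lo
```

With inputs r = 4, s = 0.3, p = 0.3, y = 0.019801658, and a = c = 1, the sketch returns a learning rate of approximately 0.132.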

C.  BLUE’S  OPTIMAL  STRATEGY 

Denote the optimal learning rate derived from the preceding algorithm by $z^*(p,y)$, and let $x^*(p,y) = c - az^*(p,y)$. Let

$$B(p,y) = B\bigl(p, y, x^*(p,y), z^*(p,y)\bigr), \qquad (17)$$

which Blue wishes to maximize by choosing $p$ and $y$. To compute Blue's optimal strategy, we first plot $B(p,y)$ and observe that the function is unimodal in each variable. We use the following algorithm to compute Blue's optimal strategy.

1. Let $i \leftarrow 0$ and $p_i \leftarrow \min(1, (s+\hat{p})/2)$. Use the golden section search to compute $y_i \leftarrow \arg\max_y B(p_i, y)$.

2. Use the golden section search to compute $p_{i+1} \leftarrow \arg\max_p B(p, y_i)$.

3. Use the golden section search to compute $y_{i+1} \leftarrow \arg\max_y B(p_{i+1}, y)$.

4. If $B(p_{i+1}, y_{i+1}) - B(p_i, y_i) > \delta$, then let $i \leftarrow i+1$ and go to step 2. The parameter $\delta$ is the error bound.

5. Output $y^* = y_{i+1}$ and $p^* = p_{i+1}$ as Blue's optimal strategy.
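The alternating search in steps 1 to 5 can be sketched as below. This is an illustrative Python outline under stated assumptions, not the VBA code of the decision aid. The names `golden_max`, `alternate_search`, and `toy_B` are hypothetical, and `toy_B` is only a stand-in concave function used to exercise the loop. It is not Blue's objective from Equation (12).

```python
import math

INV_PHI = (math.sqrt(5.0) - 1.0) / 2.0  # reciprocal of the golden ratio


def golden_max(f, lo, hi, tol=1e-8):
    """Maximize a unimodal function f on [lo, hi] by golden section search."""
    x1 = hi - INV_PHI * (hi - lo)
    x2 = lo + INV_PHI * (hi - lo)
    f1, f2 = f(x1), f(x2)
    while hi - lo > tol:
        if f1 < f2:              # maximizer lies in [x1, hi]
            lo, x1, f1 = x1, x2, f2
            x2 = lo + INV_PHI * (hi - lo)
            f2 = f(x2)
        else:                    # maximizer lies in [lo, x2]
            hi, x2, f2 = x2, x1, f1
            x1 = hi - INV_PHI * (hi - lo)
            f1 = f(x1)
    return 0.5 * (lo + hi)


def alternate_search(B, p_range, y_range, p0, delta=1e-8, max_iter=200):
    """Alternate one-dimensional searches in y and p until the gain is <= delta."""
    p = p0
    y = golden_max(lambda v: B(p, v), *y_range)                 # step 1
    value = B(p, y)
    for _ in range(max_iter):
        p_next = golden_max(lambda u: B(u, y), *p_range)        # step 2
        y_next = golden_max(lambda v: B(p_next, v), *y_range)   # step 3
        value_next = B(p_next, y_next)
        improved = value_next - value > delta                   # step 4
        p, y, value = p_next, y_next, value_next
        if not improved:
            break
    return p, y, value                                          # step 5


def toy_B(p, y):
    """Stand-in concave objective with its maximum at (0.3, 0.02)."""
    return -(p - 0.3) ** 2 - (y - 0.02) ** 2 - 0.5 * (p - 0.3) * (y - 0.02)
```

Calling `alternate_search(toy_B, (0.2, 0.4), (0.0, 1.0), p0=0.25)` exercises the loop on the stand-in function. To apply it to the model, `B` would be replaced by an evaluation of Equation (17), which requires Red's best response for each candidate pair.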

We implement the preceding algorithm in Microsoft Excel using VBA. The end result is a decision aid that computes the optimal strategies for both players. For more details on the decision aid, see Appendix A.

III. NUMERICAL ANALYSIS


The model presented in this thesis has five parameters, namely r, b, s, c, and a. The parameters r and b model the trade-off between a detected attack and an undetected attack for Red and Blue, respectively. The parameter s represents the total resources available to Blue. The parameter c represents the maximum effort Red can divide between attacking and spying on Blue's tower status. Without loss of generality, we can set c = 1, because using another value is equivalent to scaling the clock to a different time unit. Finally, the parameter a models the trade-off between Red's attack rate and its learning rate.

Intuitively, a small a implies that it is easy to learn about Blue's tower status, so Red can set aside more effort to attack. However, the effect of learning depends not only on Red's learning rate z but also on Blue's switch rate y. Because Blue can set y freely, it turns out that the parameter a does not have any effect on the optimal solution. Mathematically, rewrite Equation (11) as $R(p,y,x,z,a)$ and Equation (12) as $B(p,y,x,z,a)$ to signify their dependence on a, and note that

$$R(p, y, x, z, a) = R(p, ay, x, az, 1),$$

$$B(p, y, x, z, a) = B(p, ay, x, az, 1).$$

In other words, if we treat az (instead of z) as Red's decision variable and ay (instead of y) as Blue's decision variable, then we convert the original problem to an equivalent problem with a = 1. Consequently, we can also set a = 1 without loss of generality.
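The scaling identity for Red's reward can be checked numerically with a short Python sketch. It rests on two assumptions: that Red's reward, with the attack rate x left explicit, takes the form $K_2\,(x/4)\,h$ with $h$ as in the proof of Proposition 3, and that $t$ satisfies $e^{-2yt} = K_1/K_2$. The function name `red_reward` and the numerical inputs are hypothetical. Blue's reward is not checked here because Equation (12) is not reproduced in this chapter.

```python
import math


def red_reward(p, y, x, z, s, r):
    """Red's long-run reward rate with the attack rate x left explicit.

    Assumption: t is defined through exp(-2*y*t) = K1/K2, so scaling y
    rescales t by the inverse factor.
    """
    k1 = 2.0 - (1.0 + r) * s
    k2 = (1.0 + r) * (2.0 * p - s)
    t = math.log(k2 / k1) / (2.0 * y)
    w = z + 2.0 * y
    h = k1 / k2 + z / w + (2.0 * y / w) * math.exp(-w * t)
    return k2 * x / 4.0 * h


# Scaling y and z by the same factor a leaves the reward unchanged,
# which is why a can be absorbed into the decision variables.
a = 2.5
original = red_reward(0.3, 0.02, 0.7, 0.12, 0.3, 4.0)
rescaled = red_reward(0.3, a * 0.02, 0.7, a * 0.12, 0.3, 4.0)
difference = abs(original - rescaled)
```

In this form the reward depends on y and z only through the ratio z/(z + 2y) and the product (z + 2y)t, both of which are unchanged when y and z are multiplied by the same constant.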

From  Blue’s  standpoint,  the  optimal  choice  of  y  involves  a  delicate  balance.  If  y 
is  too  small  (say,  once  a  year),  then  Red  can  easily  take  advantage  of  it  by  setting  a 
moderate  learning  rate  without  much  sacrifice  to  its  attack  rate.  If  y  is  too  large  (say, 
once  an  hour),  then  Red  might  as  well  give  up  learning  altogether  and  attack  at  the 
maximum rate 1, which defeats Blue's purpose of using decoy surveillance towers. In other words, Blue's choice of y needs to be large enough to keep Red honest, and small enough so that Red has an incentive to set aside some effort to spy on Blue's operations.

With c = a = 1, three parameters remain to be considered: r, b, and s. In Section A, we set r = 4, b = 0.5, and s = 0.3 as our main example in order to demonstrate how to compute the optimal strategy. In Section B, we vary the parameter s in order to show how dynamic allocation can improve Blue's performance beyond stationary allocation. Finally, in Section C, we vary the parameters r and b in order to discuss some interesting observations.

A.  MAIN  EXAMPLE 

This section demonstrates how to compute the optimal strategies of Blue and Red while using dynamic allocation. We consider a plausible scenario by setting r = 4 and b = 0.5. Recall that the dynamic case applies when $s \in (\hat{p}, 2\hat{p}]$, where $\hat{p} = 0.2$ and $2\hat{p} = 0.4$ according to Equation (3). We set s = 0.3 to demonstrate computation of the optimal strategy.

1.  Red's  Optimal  Strategy 

Recall from Equation (15) that Red has one decision variable, namely the learning rate z. Red decides on the value of z after finding out Blue's detection probability p and switch rate y. For example, if Blue sets p = 0.3 and y = 0.01980, then Red can use Equation (15) to compute R(z). Figure 4 depicts the function R(z), which is concave in z as proved in Proposition 3. We then use the bisection method to compute z* = 0.13237, and from Red's constraint x + az = c, x* = 0.8676 can be solved as well. In other words, with the optimal strategy, on average, Red will learn about Blue's tower status once every 7.55 time units, and will attack once every 1.15 time units.

Figure 4. Red's optimal strategy: Given Blue's strategy is p = 0.3, y = 0.0198, Red should use z* = 0.13237.
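As a quick arithmetic check of the main example, the attack rate follows from the binding constraint with a = c = 1, and the mean times between events are the reciprocals of the corresponding rates (a property of Poisson-type event streams, assumed here for both learning and attacking):

```latex
\begin{align*}
x^* &= c - a z^* = 1 - 0.13237 = 0.86763, \\
1/z^* &= 1/0.13237 \approx 7.55 \ \text{time units between learning events}, \\
1/x^* &= 1/0.86763 \approx 1.15 \ \text{time units between attacks}.
\end{align*}
```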


2.  Blue's  Optimal  Strategy 

Recall Blue's objective function $B(p,y) = B(p,y,x^*(p,y),z^*(p,y))$ from Equation (17). For the main example with r = 4, b = 0.5, and s = 0.3, we can plot B(p,y), which is shown in Figure 5.

Figure 5. Blue's objective function B(p,y) for r = 4, b = 0.5, s = 0.3

Although it may be difficult to see, the function B(p,y) in Figure 5 is indeed unimodal in p and in y. Figure 6 shows the same function when one of the variables is fixed. We use the golden section search method in each dimension iteratively to compute p* and y*, as discussed in Chapter II, Section C. In the main example, the algorithm produces B(p*,y*) = -0.43631 when Blue sets p* = 0.3 and y* = 0.0198. In other words, the model suggests that Blue's optimal strategy is to set one tower with detection probability p = 0.3 and the other with detection probability s - p = 0 and, on average, switch between towers every 50.5 time units.


Figure 6. B(p,y*) and B(p*,y) are unimodal in p and y when r = 4, b = 0.5, s = 0.3.

Although we did not prove it mathematically, B(p,y) is unimodal in p and in y in all the numerical experiments we conducted. Another example, in which the optimal p does not lie on the boundary, is shown in Figure 7. Consequently, our algorithm in Chapter II works well in computing the optimal solution.

Figure 7. B(p,y*) and B(p*,y) are unimodal in p and y when r = 1, b = 0.1, s = 0.9.


B.  EFFECTIVENESS  OF  DYNAMIC  ALLOCATION 

This section compares dynamic allocation with stationary allocation. Notice that when using dynamic allocation in two towns, the long-run reward rates derived from Equations (11) and (12) represent Red's and Blue's rewards in one town, respectively. For a fair comparison with stationary allocation, we multiply these two numbers by two. All the numbers reported in the remainder of this chapter refer to the total reward in the two towns.

In the main example, we set r = 4 and b = 0.5, which yields $\hat{p} = 0.2$ and $2\hat{p} = 0.4$, according to Equation (3). We consider three cases as in Chapter II, Section A.

First, in the case $s \in [0, \hat{p}]$, Blue's long-run reward rate is given by Equation (5), and Red's long-run reward rate is given by Equation (4), as derived in Chapter II, Section A. As shown in Figure 8, when $s \in [0, 0.2]$, Red's long-run reward rate decreases linearly, and Blue's long-run reward rate increases linearly, as Blue's resources increase.

Second, in the case $s \in (\hat{p}, 2\hat{p}]$, Blue has two options for resource allocation, either stationary allocation or dynamic allocation. If Blue chooses a stationary allocation strategy, we can use Equation (9) to plot Blue's long-run reward rate, and use Equation (7) to plot Red's long-run reward rate. The results of dynamic allocation come from the algorithms described in Chapter II, Sections B and C. With a dynamic allocation strategy, we can use Equation (12) to plot Blue's long-run reward rate, and use Equation (11) to plot Red's long-run reward rate.

Third, in the case $s \in (2\hat{p}, 2]$, it is trivial for Blue to allocate $p_1 = p_2 = s/2 > \hat{p}$. It is optimal for both Red teams to stop their operations, so the payoffs of all players are zero. Therefore, this case is not shown in Figure 8.
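The three cases above amount to a simple classification of Blue's total resource s. The Python sketch below illustrates it. The function name and the regime labels are hypothetical, and the threshold is assumed to be $\hat{p} = 1/(1+r)$, which is inferred from the values quoted for r = 4 rather than copied from Equation (3).

```python
def allocation_regime(s, r):
    """Classify Blue's total resource s into the three cases of this section."""
    if not 0.0 <= s <= 2.0:
        raise ValueError("s must lie in [0, 2]")
    p_hat = 1.0 / (1.0 + r)  # assumed form of the threshold in Equation (3)
    if s <= p_hat:
        return "low"           # first case: s in [0, p_hat]
    if s <= 2.0 * p_hat:
        return "intermediate"  # second case: compare stationary and dynamic
    return "high"              # third case: Red stops, all payoffs are zero


main_example = allocation_regime(0.3, 4.0)
```

For the main example (r = 4, s = 0.3), the function returns the intermediate regime, which is the only one where the choice between stationary and dynamic allocation matters.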

As shown in Figure 8, dynamic allocation is better than stationary allocation for Blue. In our main example, the improvement in Blue's long-run reward rate under dynamic allocation, relative to stationary allocation, grows from 0.15% to 28.71% as s increases from 0.2 to 0.4.

With Blue's dynamic allocation strategy, however, Red's long-run reward rate also increases. Because Red can learn the tower's status, Red will attack when attacks are less likely to be detected, and will pause when attacks are more likely to be detected. Although Red has a smaller attack rate x (because Red sets aside some effort for learning), its attacks become more effective. Consequently, Red's performance also improves when Blue uses dynamic allocation. In fact, Red's percentage improvement over stationary allocation is larger than Blue's. This observation will be examined again in Section C of this chapter.

Blue's use of dynamic allocation also affects Red's operations. In Figure 9, we plot the long-run proportion of time Red is attacking, as derived in Equation (14). The proportion of time Red is attacking is higher than the 50% obtained under stationary allocation, and increases when Blue's total resource s increases. The long-run attack rate, however, is computed by multiplying Equation (14) by the instantaneous attack rate x. As seen in Figure 10, Red's long-run attack rate is lower than the value of 0.5 obtained under stationary allocation, and decreases when s increases.


Figure 9. Long-run proportion of time Red is attacking in dynamic allocation, compared with 0.5 in the case of stationary allocation.

Figure 10. Red's long-run attack rate in dynamic allocation, compared with 0.5 in the case of stationary allocation.

C. DISCUSSIONS

In this section, we vary parameters b and r to see how they affect the optimal strategy. First, we fix r = 4, and compare three values of b, namely 0.1, 0.5, and 0.9, as shown in Figure 11.

As shown in Figure 11, the results look qualitatively the same when b changes. Dynamic allocation always provides some benefit over stationary allocation, and the improvement is more significant when s increases. Table 2 reports the reward rate of dynamic allocation as a percentage over that of stationary allocation, for Blue and Red, respectively. It shows that, for both players, the improvement is more significant for a larger value of b. In other words, dynamic allocation is more effective for a larger s or for a larger b. Also seen in Table 2, Red's improvement is larger than Blue's.


Table  2.  Improvement  in  dynamic  allocation  as  b  increases 


Next, we repeat the experiment for r = 1 and r = 9, and plot the results in Figure 12 and Figure 13, respectively.

[Figure 13: three panels, for r = 9 with b = 0.1, b = 0.5, and b = 0.9. Horizontal axis: Blue's Total Resource (Detection Probability). Series: Blue Dynamic, Blue Stationary, Red Dynamic, Red Stationary.]

Figure 13. Comparison between dynamic allocation and stationary allocation with r = 9, b = 0.1, 0.5, 0.9

As shown in Figure 12, there are some cases where dynamic allocation does not provide additional benefits beyond those of stationary allocation. For instance, when r = 1 and b = 0.1, the optimal solution to dynamic allocation coincides with stationary allocation when s is between 0.5 and 0.8. We plot Red's optimal strategy in this case in Figure 14.

Figure 14. Red's optimal strategy when the optimal solution to dynamic allocation coincides with stationary allocation


Qualitatively speaking, dynamic allocation tends to be less effective when r, b, and s are small. Below we offer some intuitive explanations. When r is small (close to 0), Red is not very concerned with a detected attack, so Red has less incentive to invest in the learning rate, which makes dynamic allocation less effective. When b is large (close to 1), Blue cares little whether an attack is detected or not. Instead, it is important for Blue to reduce Red's long-run attack rate, which can be accomplished by dynamic allocation. Finally, when s is small (close to $\hat{p}$), Red's expected payoff for each attack is only slightly less than 0, even if the detection probability is s. Therefore, Red has less incentive to find out Blue's tower status, which makes dynamic allocation less effective. In summary, dynamic allocation tends to be more effective when r, b, and s are larger.

IV. CONCLUSIONS


This  thesis  examined  how  to  operate  two  surveillance  towers  most  effectively 
with  limited  manpower  (surveillance  resources).  In  particular,  a  dynamic  allocation 
strategy  was  studied,  with  which  the  surveillance  team  is  moved  between  the  two  towers 
intermittently.  Because  it  is  difficult  to  tell  from  the  outside  whether  a  surveillance  tower 
is  fully  functional,  the  understaffed  tower  can  serve  as  a  decoy  to  deter  insurgent 
activities.  The  problem  was  formulated  as  a  two-person  nonzero-sum  game  between  the 
insurgents and the government forces, with the latter moving first. After presenting an algorithm to compute the equilibrium of this game, we demonstrated the study's findings numerically.

Our  analysis  suggests  that  the  dynamic  allocation  strategy  can  improve  the 
performance  of  surveillance  towers  under  most  circumstances.  The  improvement  tends  to 
be  more  significant  when  government  forces  have  more  surveillance  resources.  Dynamic 
allocation  tends  to  be  less  effective  when  (1)  a  detected  attack  has  a  smaller  negative 
impact  on  insurgent  operations,  or  when  (2)  a  detected  attack  brings  a  larger  immediate 
benefit  to  government  forces.  Our  model  applies  not  only  to  military  operations  but  also 
to  surveillance  problems  in  general. 

There  are  some  limitations  to  our  model.  First,  it  assumes  that  attacks  follow  a 
Poisson  process.  Second,  it  assumes  that  there  is  no  cost  to  switch  the  resource  between 
surveillance  towers.  This  assumption  may  be  reasonable  when  the  videos  from  two 
towers  are  fed  to  a  single  control  room,  but  not  if  there  are  two  separate  control  rooms. 
Third,  the  model  assumes  that  the  detection  probability  increases  linearly  in  the  allocated 
resource. 

There  are  many  possible  future  research  directions.  First,  it  may  be  worthwhile  to 
study  an  asymmetric  model  with  different  parameters  in  the  two  towns.  Second, 
extending  analysis  to  more  than  two  surveillance  towers  can  give  government  forces 
more flexibility. Finally, this thesis studies a Stackelberg equilibrium, in which government forces move first and insurgents move second. It is important to study whether there are other types of equilibria, especially an equilibrium that results from simultaneous moves.

APPENDIX


A.  DECISION  AID 

This Excel file implements the algorithms described in Chapter II, and consists of six worksheets. The green cells require the user to enter input values. The yellow cells are computation results from the VBA code. The red cells contain formulas, which should not be modified by the user. Below we explain the worksheets one at a time.

1. Parameter

This worksheet allows a user to enter model parameters, namely r, b, a, c, and s. The reward table will be formulated and $\hat{p}$ computed. By clicking on the button, the program will check the value of s and show the corresponding result, as in Figure 15. In addition, the default precision for computation is 1.0E-8, which can be changed according to the user's desired accuracy.


[Figure 15 shows two example message boxes. For 2pHat < s <= 2: "Red should STOP attacks in both towns! Long-run reward rate: Red = 0, Blue = 0." For pHat < s <= 2pHat: "Consider Stationary and Dynamic allocation in next step!"]

Figure 15. Allocation recommendations for different s


2. Blue

[Worksheet screenshot for the main example. It shows the initial cut p, the precision (0.00000001), the iterations of the golden section search, and the resulting solution under dynamic allocation: p* = 0.3, y* = 0.019801658, x* = 0.867628988, z* = 0.132371012, B* = -0.436312101, R* = 0.35952337.]

This worksheet will be activated if $s \in (\hat{p}, 2\hat{p}]$. In this worksheet, the user needs to enter an initial cut “p” ($p \in [\hat{p}, s]$) as a starting point for the program to perform the golden section search, as explained in Chapter II, Section C. The optimal strategies for Blue and Red will then be reported. Also, the iteration results of the golden section search will be listed for reference on the right-hand side. This worksheet implements the algorithms in Chapter II, Section C, “Blue’s Optimal Strategy.”

3. Red

[Worksheet screenshot for p = 0.3, y = 0.0198. It shows plots of R(z) and R'(z) and the reported values Red(p,y,x*,z*) = 0.359523 and Blue(p,y,x*,z*) = -0.436312.]

This worksheet plots Red’s objective function R(z) and computes Red’s optimal strategy, z* and x*, as explained in Chapter II, Section B. Blue’s strategy p and y are required inputs. This worksheet implements the algorithm in Chapter II, Section B, “Red’s Optimal Strategy.”

By holding p constant, this worksheet plots B(p,y) as a function of y, and computes the y that maximizes it. It implements the golden section search method. The value of p is a required input. This worksheet implements the algorithm in Chapter II, Section C, “Blue’s Optimal Strategy.”

FindP

By holding y constant, this worksheet plots B(p,y) as a function of p, and computes the p that maximizes it. It implements the golden section search method. The value of y is a required input. This worksheet implements the algorithm in Chapter II, Section C, “Blue’s Optimal Strategy.”

This worksheet shows, for given r, b, a, and c, the comparison of dynamic and stationary allocations using different values of s. In addition to numbers, it shows a plot to compare the two strategies.








INITIAL  DISTRIBUTION  LIST 


1. Defense Technical Information Center
Ft. Belvoir, Virginia

2. Dudley Knox Library
Naval Postgraduate School
Monterey, California

3. Kyle Y. Lin
Naval Postgraduate School
Monterey, California

4. Douglas R. Burton
Naval Postgraduate School
Monterey, California

5. Che-Shiung Lin
Naval Postgraduate School
Monterey, California


